Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Abstract

Two less-addressed issues of deep reinforcement learning are (1) a lack of generalization capability to new target goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to the task of target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows it to generalize better. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and across scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), and (4) is end-to-end trainable and does not need feature engineering, feature matching between frames, or 3D reconstruction of the environment. The supplementary video can be accessed at the following link: https://youtu.be/SmBxMDiOrvs.
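
The idea of a policy that is "a function of the goal as well as the current state" can be illustrated with a minimal sketch. This is not the authors' architecture: the layer sizes, the plain concatenation of state and goal vectors, and the names `init_params` and `actor_critic` are all assumptions made for illustration, and the weights are random rather than trained. The sketch only shows that a single set of parameters can produce different action distributions for different goals.

```python
import numpy as np


def init_params(state_dim, goal_dim, hidden_dim, num_actions, seed=0):
    """Randomly initialise a tiny goal-conditioned actor-critic (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    in_dim = state_dim + goal_dim
    return {
        "W_h": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        "W_pi": rng.normal(0.0, 0.1, (hidden_dim, num_actions)),
        "b_pi": np.zeros(num_actions),
        "W_v": rng.normal(0.0, 0.1, (hidden_dim, 1)),
        "b_v": np.zeros(1),
    }


def actor_critic(params, state, goal):
    """Return (action probabilities, state value) for a state/goal pair."""
    # Assumption: state and goal are already embedded as 1-D feature vectors.
    x = np.concatenate([state, goal])
    # Shared hidden layer with ReLU, fed by both the state and the goal.
    h = np.maximum(0.0, x @ params["W_h"] + params["b_h"])
    # Actor head: softmax over actions (shifted by the max for numerical stability).
    logits = h @ params["W_pi"] + params["b_pi"]
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    # Critic head: scalar value estimate.
    value = float((h @ params["W_v"] + params["b_v"])[0])
    return probs, value
```

Calling `actor_critic(params, state, goal_a)` and `actor_critic(params, state, goal_b)` with the same `params` and `state` yields two different action distributions, which is the property that lets one model be queried for new targets without a separate policy per goal.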